Python for Clinical Study Reports and Submission

R/Pharma 2025 Workshop

Yilong Zhang, Nan Xiao

2025-11-07

Welcome

Outline

Four parts of this workshop:

  1. Python environment setup (Nan)
    Use uv to create and manage reproducible Python projects. Develop and collaborate in GitHub Codespaces, Visual Studio Code, or Positron.

  2. Python packages for clinical reporting (Yilong)
    A guided tour of essential packages such as polars, plotnine, and rtflite, with demonstrations of creating TLFs commonly used in clinical trials.

  3. Manage clinical trial A&R projects (Yilong)
    Practical project structure, conventions, and execution from data to deliverables.

  4. Prepare eCTD submission packages (Nan)
    An example workflow for assembling submission-ready source code and outputs using py-pkglite, aligned with eCTD requirements.

Disclaimer

The views and opinions expressed in this presentation are those of the individual presenters and do not represent those of their affiliated organizations or institutions.

Training objective

Learn how to use Python to:

  • Create tables for clinical study reports
  • Organize clinical development projects effectively
  • Prepare eCTD submission packages to regulatory agencies

Note

Toolchains, processes, and formats may differ across organizations. We present one common way to address them.

Note

Interested in R? Check https://r4csr.org/

Acknowledgements

  • R/Pharma organizers

    • It is a fun and productive annual gathering
    • Please consider sharing stories and use cases to expand the community
  • Team members from Meta Platforms and Merck & Co., Inc., Rahway, NJ, USA

  • Contributors to the pycsr and r4csr training materials

    • Please consider submitting issues or pull requests to the repository

Preparation

In this workshop, we assume you have basic Python programming experience and clinical development knowledge.

  • Python packages: polars (data manipulation), plotnine (plotting), rtflite (RTF tables).
  • ADaM data: adsl, adae, etc.

Resource

  • Training material: https://pycsr.org/

  • During the workshop, we will use the pycsr project

    • Project link will be shared in chat
    • Post questions in group chat

Philosophy

We share the philosophy described in Section 1.1 of the R Packages book, quoted here:

  • “Anything that can be automated, should be automated.”
  • “Do as little as possible by hand. Do as much as possible with functions.”
  • “The goal is to spend your time thinking about what you want to do rather than thinking about the minutiae of package structure.”

Python environment setup

Development environments

Three recommended options:

GitHub Codespaces

  • Cloud-based, pre-configured
  • No local setup needed
  • 120 free core hours/month (personal accounts)

Positron

  • Posit’s next-gen IDE
  • Native notebook support
  • Built-in data viewer

VS Code

  • Most popular choice
  • Rich extension ecosystem
  • Essential extensions: Python, Pylance, Ruff, Quarto

Why uv?

uv is a modern Python package and project manager written in Rust.

Replaces scattered toolchain:

  • pip + venv + pyenv + pip-tools + setuptools

Benefits:

  • Fast: 10-100x faster than pip
  • Complete: Manages Python versions, dependencies, builds
  • Modern: Uses pyproject.toml as single source of truth
  • Reliable: Automatic dependency resolution and lock files

Installing uv

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Verify:

uv --version

Quick start with uv

# Create new project
uv init pycsr-example
cd pycsr-example

# Pin Python version
uv python pin 3.13.9

# Add dependencies
uv add polars plotnine rtflite

# Add dev dependencies
uv add --dev ruff pytest mypy

# Sync environment
uv sync

Python toolchain essentials

Ruff - Code formatting and linting

uv run ruff format .
uv run ruff check .

mypy - Type checking

uv run mypy src/

pytest - Testing framework

uv run pytest tests/

All configured in pyproject.toml.

Key concepts

Virtual environments are essential in Python

  • Isolate project dependencies
  • Prevent conflicts
  • Enable reproducibility

Dependency locking

  • uv.lock pins exact versions
  • Ensures reproducible environments
  • Similar to R’s renv.lock

.python-version file

  • Specifies exact Python version (e.g., 3.13.9)
  • Critical for regulatory submissions
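A reproducibility script can fail fast when the interpreter does not match the pin. The sketch below assumes .python-version holds a plain numeric version such as 3.13.9 (the form written by uv python pin in this example); matches_pin is a hypothetical helper, not a uv command.

```python
import sys
from pathlib import Path


def pinned_python_version(path: Path) -> str:
    """Read the version string from a .python-version file (e.g. "3.13.9")."""
    return path.read_text(encoding="utf-8").strip()


def matches_pin(pinned: str, running: tuple[int, ...]) -> bool:
    """Compare the running interpreter with a numeric pin like "3.13" or "3.13.9".

    Only as many components as the pin specifies are compared.
    Assumes the pin is purely numeric (no "cpython-" prefix or suffixes).
    """
    wanted = tuple(int(part) for part in pinned.split("."))
    return tuple(running[: len(wanted)]) == wanted


if __name__ == "__main__":
    pin_file = Path(".python-version")
    if pin_file.exists():
        pinned = pinned_python_version(pin_file)
        running = tuple(sys.version_info[:3])
        status = "OK" if matches_pin(pinned, running) else "MISMATCH"
        print(f"{status}: pinned {pinned}, running {'.'.join(map(str, running))}")
```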

Delivering TLFs in CSR

ICH E3 guidance

ICH E3, Structure and Content of Clinical Study Reports, provides guidance to assist sponsors in developing a CSR.

In a CSR, most TLFs are located in:

  • Section 10: Study patients
  • Section 11: Efficacy evaluation
  • Section 12: Safety evaluation
  • Section 14: Tables, Figures and Graphs referred to but not included in the text
  • Section 16: Appendices

Datasets

Tools

  • polars: Python package for data manipulation, similar to the dplyr and tidyr R packages

  • rtflite: Python package for creating production-ready tables and figures in RTF format, similar to the R package r2rtf

polars introduction

TBD

rtflite introduction

Motivation

In the pharmaceutical industry, RTF and Microsoft Word play a central role in preparing clinical study reports.

Different organizations can have different table standards, for example:

  • Table layout, font size, border type, footnotes, data sources

rtflite is a Python package for creating production-ready tables and figures in RTF format.

rtflite is designed to:

  • Provide simple Python classes that map to table elements (title, headers, body, footnotes) for intuitive table construction.
  • Offer a canonical Python API with a clear, composable interface.
  • Focus exclusively on table formatting and layout, leaving data manipulation to dataframe libraries like polars or pandas.
  • Minimize external dependencies for maximum portability and reliability.

Workflow

Before creating an RTF table, we need to:

  • Figure out table layout.

  • Split the layout into small tasks in the form of a computer program.

  • Execute the program.

Minimal example

TBD

Package overview

The rtflite package provides flexibility to customize table appearance, including:

  • Table component: title, column header, footnote, etc.
  • Table cell style: size, border type, color, font size, text color, alignment, etc.
  • Flexible control: the specification of the cell style can be row or column vectorized.
  • Complicated format: pagination, section grouping, multiple table concatenations, etc.

The rtflite package can also convert figures into RTF format.

Simple example: adverse events

rtflite focuses only on table formatting. Data manipulation and analysis should be handled by other Python libraries such as polars.

Function summary

rtflite provides simple Python classes that map to table elements. The goal is to help you translate data frames into tables in RTF files.

add code example

Function illustration

add figure

Break and/or exercise (5 min)

CSR examples

Disposition table

https://pycsr.org/tlf-disposition.html

Analysis population

https://pycsr.org/tlf-population.html

Baseline characteristics

https://pycsr.org/tlf-baseline.html

Efficacy table

https://pycsr.org/tlf-efficacy-ancova.html

AE Summary table

https://pycsr.org/tlf-ae-summary.html

Specific AE table

https://pycsr.org/tlf-ae-specific.html

Break (5 min)

Analysis package

What is an analysis package?

A Python package designed specifically to organize analysis scripts and code for a clinical trial project.

Purpose:

  • Project containers for clinical trial deliverables
  • Reproducible environments for analyses
  • Submission-ready structures for regulatory review

Combines:

  • Python package structure (code organization)
  • Quarto project (report generation)
  • Regulatory requirements (eCTD submission)

Package structure

demo-py-esub/
├── pyproject.toml          # Project metadata
├── .python-version         # Python version
├── uv.lock                 # Locked dependencies
├── src/demo001/            # Study-specific code
│   ├── __init__.py
│   └── utils.py
├── analysis/               # Quarto analysis docs
│   └── tlf-*.qmd
├── data/                   # ADaM datasets
├── output/                 # Generated TLFs
└── tests/                  # Validation tests

See: https://pycsr.org/pkg-structure.html
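Study-specific helpers shared by several TLF programs belong in src/demo001/. As a hypothetical illustration (the function name and formatting rule are assumptions, not taken from the demo repository), utils.py might hold a formatter for the "n (%)" cells common in CSR tables:

```python
# src/demo001/utils.py (hypothetical content)


def fmt_n_pct(n: int, total: int, digits: int = 1) -> str:
    """Format a count as "n (pct)", e.g. 2 of 3 subjects -> "2 (66.7)"."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and total")
    pct = 100 * n / total
    # Python's formatting rounds the binary float value; a strict
    # round-half-up convention would need decimal.Decimal instead.
    return f"{n} ({pct:.{digits}f})"
```

Keeping such a rule in one place means every table applies the same formatting, and it can be unit tested under tests/.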

Benefits

Consistency

  • Standard structure across projects
  • Team knows where files belong

Reproducibility

  • uv.lock pins dependencies
  • .python-version specifies Python

Automation

  • uv sync restores environment
  • quarto render generates outputs
  • pytest validates code

Compliance

  • Built-in documentation
  • Testing infrastructure
  • Standard structure

Git-centric workflow

Core principle: All project assets in version control.

Plain text workflow:

  • .qmd files for analysis (not .ipynb for final deliverables)
  • .md files for documentation
  • .toml files for configuration
  • Avoid .xlsx files for tracking

Project tracking:

  • Issues for requirements
  • Pull requests for review
  • Project boards (Kanban)

See: https://pycsr.org/pkg-management.html

Development lifecycle

Planning:

  • Define TLFs from SAP
  • Create mock tables
  • Assign validation levels
  • Lock the Python version and package repository

Development:

  • Create feature branches
  • Implement in analysis/ and src/
  • Self-test against mocks
  • Open pull requests

Validation:

  • Independent review
  • Write unit tests in tests/
  • Run automated checks (ruff, mypy, pytest)

Delivery:

  • Generate all outputs with quarto render
  • Prepare submission package
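For the validation step, one common industry pattern (assumed here; the workshop slides do not prescribe it) is independent double programming: a QC implementation recomputes key numbers and a pytest test compares them with production. The sketch keeps both sides in one self-contained file; in a real project the production function would be imported from src/, and count_by_arm, qc_count_by_arm, and the records are hypothetical.

```python
# tests/test_counts.py (hypothetical), run with: uv run pytest tests/
from collections import Counter

# Hypothetical subject-level records: (USUBJID, TRT01A, SAFFL)
ADSL = [
    ("01-001", "Placebo", "Y"),
    ("01-002", "Placebo", "Y"),
    ("01-003", "Xanomeline", "N"),
    ("01-004", "Xanomeline", "Y"),
]


def count_by_arm(records):
    """Production-style count: subjects in the safety population per arm."""
    counts = {}
    for _, arm, saffl in records:
        if saffl == "Y":
            counts[arm] = counts.get(arm, 0) + 1
    return counts


def qc_count_by_arm(records):
    """Independent QC implementation using a different approach."""
    return dict(Counter(arm for _, arm, saffl in records if saffl == "Y"))


def test_counts_match_independent_qc():
    assert count_by_arm(ADSL) == qc_count_by_arm(ADSL)


def test_expected_counts():
    assert count_by_arm(ADSL) == {"Placebo": 2, "Xanomeline": 1}
```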

Break (5 min)

eCTD submission

FDA requirements

FDA Study Data Technical Conformance Guide Section 4.1.2.10:

Submit programs for primary and secondary efficacy analyses. Specify software in ADRG. Use ASCII text format. No executable extensions.

Goal: Enable reviewers to understand and confirm analysis algorithms.

See: https://pycsr.org/submission-overview.html

eCTD Module 5 structure

m5/datasets/<study-id>/analysis/adam/
├── datasets/
│   ├── *.xpt               # ADaM datasets
│   ├── define.xml
│   ├── adrg.pdf            # Instructions
│   └── analysis-results-metadata.pdf
└── programs/
    ├── py0pkgs.txt         # Packed Python package
    ├── tlf-01-*.txt        # Analysis programs
    └── tlf-02-*.txt

Key: All files in programs/ must be ASCII text.
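Before packaging, the programs/ folder can be checked automatically. The sketch below flags any entry that is not a regular file, does not use the .txt extension, or contains a non-ASCII byte. Treating .txt as the only allowed extension is an assumption based on the layout above; adjust it to your own submission conventions.

```python
from pathlib import Path


def check_programs_dir(programs: Path) -> list[str]:
    """List problems found in a submission programs/ folder.

    Flags anything that is not a regular file, any extension other than .txt,
    and any file containing a byte outside the 7-bit ASCII range.
    """
    problems = []
    for path in sorted(programs.iterdir()):
        if not path.is_file():
            problems.append(f"{path.name}: not a regular file")
            continue
        if path.suffix != ".txt":
            problems.append(f"{path.name}: extension {path.suffix or '(none)'} is not .txt")
        data = path.read_bytes()
        if not data.isascii():
            # Report the first offending byte offset to help locate the character.
            offset = next(i for i, byte in enumerate(data) if byte > 0x7F)
            problems.append(f"{path.name}: non-ASCII byte at offset {offset}")
    return problems


if __name__ == "__main__":
    programs = Path("programs")
    if programs.is_dir():
        for problem in check_programs_dir(programs):
            print(problem)
```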

The solution: pkglite for Python

Packs Python projects into portable text files.

Why needed:

  • Python packages have directory structure
  • May contain binary files
  • FDA requires ASCII text format

pkglite capabilities:

  • Pack entire project into single .txt file
  • Preserve file paths and metadata
  • Unpack to restore original structure
  • Support multiple packages in one file

Documentation: https://pharmaverse.github.io/py-pkglite/

Packing workflow

1. Create .pkgliteignore

uvx pkglite use demo-py-esub/

2. Pack the package

uvx pkglite pack demo-py-esub/ \
  -o programs/py0pkgs.txt

3. Convert Quarto to Python scripts

  • Render .qmd -> verify it works
  • Convert .qmd -> .ipynb -> .py
  • Clean and format with ruff
  • Save as .txt (no .py extension)

See: https://pycsr.org/submission-package.html
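The route above goes through .ipynb. As a simpler alternative (a swapped-in technique, not the workflow on this slide), the Python chunks can be pulled out of a .qmd file directly with a regular expression. The sketch assumes every executable chunk opens with a {python} fence and drops Quarto cell options (lines starting with #|); the file paths in the usage comment are hypothetical. Rendering the .qmd first and formatting the result with ruff still apply.

```python
import re
import sys
from pathlib import Path

# Matches Quarto chunks opened by a {python} fence (chunk options inside the
# braces are allowed) and captures the body up to the closing fence.
CHUNK = re.compile(r"^```\{python[^}]*\}[^\n]*\n(.*?)^```\s*$", re.DOTALL | re.MULTILINE)


def qmd_to_script(qmd_text: str) -> str:
    """Concatenate the Python chunks of a .qmd document into one script."""
    chunks = [m.group(1).rstrip() for m in CHUNK.finditer(qmd_text)]
    # Drop Quarto cell options such as "#| echo: false"; they are not Python logic.
    cleaned = [
        "\n".join(line for line in chunk.splitlines() if not line.startswith("#|"))
        for chunk in chunks
    ]
    return "\n\n".join(c for c in cleaned if c.strip()) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python qmd2txt.py analysis/tlf-01.qmd programs/tlf-01.txt
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    # encoding="ascii" makes the write fail loudly if the code contains non-ASCII text.
    dst.write_text(qmd_to_script(src.read_text(encoding="utf-8")), encoding="ascii")
```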

Packed file format

Human-readable Debian Control File (DCF) format:

# Generated by py-pkglite
# Use `pkglite unpack` to restore

Package: demo-py-esub
File: pyproject.toml
Format: text
Content:
  [project]
  name = "demo001"
  version = "0.1.0"
  ...

Reviewers can read without special tools.
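Because the packed file is plain text, a reviewer or QC programmer can also inspect it with a few lines of code. Assuming the record layout shown above (Package:, File:, and Format: fields at the start of a line, with file content indented underneath), this sketch lists the packaged files. It is not part of py-pkglite; pkglite unpack remains the supported way to restore the project.

```python
from pathlib import Path


def list_packed_files(packed_text: str) -> list[tuple[str, str, str]]:
    """Return (package, file, format) for each record in a pkglite-style file."""
    records = []
    package = file = None
    for line in packed_text.splitlines():
        # Field lines start at column 0; indented file content never matches.
        if line.startswith("Package: "):
            package = line.removeprefix("Package: ").strip()
        elif line.startswith("File: "):
            file = line.removeprefix("File: ").strip()
        elif line.startswith("Format: ") and package and file:
            records.append((package, file, line.removeprefix("Format: ").strip()))
            file = None
    return records


if __name__ == "__main__":
    packed = Path("programs/py0pkgs.txt")
    if packed.exists():
        for package, file, fmt in list_packed_files(packed.read_text(encoding="ascii")):
            print(f"{package}: {file} ({fmt})")
```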

Updating ADRG

Document the Python environment:

Python environment:

Software  Version  Description
Python    3.13.9   Programming language
uv        0.9.7    Package manager

Packages:

Package  Version  Description
polars   1.35.1   Data manipulation
rtflite  1.0.2    RTF generation
demo001  0.1.0    Study functions

Appendix: Step-by-step reproduction instructions.

Dry run testing

Essential: Simulate reviewer experience before submission.

Workflow:

  1. Create clean directory
  2. Copy submission materials
  3. Unpack package: uvx pkglite unpack programs/py0pkgs.txt -o .
  4. Install dependencies: cd demo-py-esub && uv sync
  5. Run each program in the project environment, e.g.: uv run python ../programs/tlf-01-*.txt
  6. Verify outputs match originals

Catches: Missing dependencies, path errors, platform issues.

See: https://pycsr.org/submission-dryrun.html
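For step 6, comparing file fingerprints is one simple option, with a caveat: it only works if the programs produce byte-identical output, so any embedded timestamp or run date would have to be normalized first. The helper below compares two output folders by SHA-256; the folder names in the usage block are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its SHA-256 hex digest."""
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def compare_outputs(original: Path, rerun: Path) -> dict[str, list[str]]:
    """Report files missing from the rerun, extra in the rerun, or different."""
    a, b = file_digests(original), file_digests(rerun)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    original, rerun = Path("output"), Path("dryrun/output")  # hypothetical locations
    if original.is_dir() and rerun.is_dir():
        for kind, names in compare_outputs(original, rerun).items():
            for name in names:
                print(f"{kind}: {name}")
```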

Demo repositories

Analysis package: https://github.com/elong0527/demo-py-esub

Submission package: https://github.com/elong0527/demo-py-ectd

Clone and explore to see complete examples.

Q&A

Resources

Book:

Regulatory:

Technical: